Google map plan
1. Introduction
While scraping leads for my job hunt (still hunting, by the way), I wanted to visualize how businesses are distributed. Little did I know that I would find some businesses located in the sea.
Here is an example:
SpeedTalk® Mobile, a business with more than 18K ratings.
After scraping more data from Google Maps and processing it, I was able to make this plot, which shows what someone from the community called "the distribution of geocoding errors".
If you need the graph separately: https://josephelhaddad.github.io/plotly/b_in_sea2
The data set can be found here:
https://drive.google.com/drive/folders/1rCXC7h1kgVbcUA0Bu5yXj4NGUbqst2Cl?usp=sharing
2. Quick stats
Coordinates distribution
| Coordinates | Count | Percentage % |
| --- | --- | --- |
| 46.423669, -129.9427086 | 97897 | 73.44 |
| 0, 0 | 5702 | 4.27 |
| 27.698638, -83.804601 | 4149 | 3.11 |
| 37.878638, -122.4203375 | 3326 | 2.49 |
| 37.7848269, -122.7073054 | 1030 | 0.77 |
| 33.8256055, -118.641338 | 673 | 0.50 |
| 14.1576412, -106.6918595 | 562 | 0.42 |
| 40.419584, -73.6754126 | 427 | 0.32 |
| 41.7993125, -70.3086624 | 335 | 0.25 |
| 34.032332, -119.134398 | 325 | 0.24 |

States distribution
| State | Count | Percentage % |
| --- | --- | --- |
| California | 25124 | 18.85 |
| Florida | 19333 | 14.50 |
| New York | 10294 | 7.72 |
| Texas | 9491 | 7.12 |
| Ohio | 5052 | 3.79 |
| Pennsylvania | 4058 | 3.04 |
| Georgia | 3729 | 2.80 |
| Massachusetts | 3543 | 2.66 |
| Missouri | 3234 | 2.43 |
| Illinois | 3182 | 2.39 |

Business types distribution
| Business type | Count | Percentage % |
| --- | --- | --- |
| Marketing agency | 12282 | 9.21 |
| Marketing consultant | 3607 | 2.70 |
| Internet marketing service | 3130 | 2.34 |
| Interior designer | 2649 | 1.98 |
| House cleaning service | 1656 | 1.24 |
| Website designer | 1562 | 1.17 |
| Electrician | 1426 | 1.06 |
| Construction company | 1300 | 0.97 |
| Tutoring service | 1186 | 0.88 |
| Painter | 1139 | 0.85 |
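For anyone who wants to reproduce this kind of tally, here is a minimal sketch of a frequency count over coordinate pairs. The sample list and the `coordinate_distribution` helper are hypothetical, made up for illustration; the real data set has its own layout and far more rows, and this is not necessarily how the numbers in this section were produced.

```python
from collections import Counter

# Hypothetical sample of scraped (latitude, longitude) pairs.
# The real data set has its own column names and many more rows.
coords = [
    (46.423669, -129.9427086),
    (46.423669, -129.9427086),
    (0.0, 0.0),
    (46.423669, -129.9427086),
    (27.698638, -83.804601),
]


def coordinate_distribution(points, top=10):
    """Return (point, count, percentage) for the most repeated coordinates."""
    counts = Counter(points)  # exact-match grouping of identical pairs
    total = len(points)
    return [
        (point, count, round(100 * count / total, 2))
        for point, count in counts.most_common(top)
    ]


distribution = coordinate_distribution(coords)
for point, count, pct in distribution:
    print(point, count, pct)
```

Grouping on exact float equality works here only because the repeated coordinates are byte-for-byte identical; coordinates that merely sit close together would need rounding or clustering first.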
3. Further investigation
Some members suggested that [46.423669, -129.9427086] could be the [0, 0] (Null Island) of the US, the same way Switzerland has its own anchor.
The [46.423669, -129.9427086] point lies west of the USA, so can it be the center?
To answer this, I looked up the four extreme points of US territory using Wikipedia's list: https://en.wikipedia.org/wiki/List_of_extreme_points_of_the_United_States
These are the extreme points:
- Northernmost - Utqiagvik, Alaska: 71.290556, -156.788611
- Southernmost - Rose Atoll: -14.546667, -168.151944
- Westernmost - Point Udall (Guam): 13.447556, 144.618194
- Easternmost - Point Udall (U.S. Virgin Islands): 17.755833, -64.566944
I made an "area" out of the rounded values [71, -14, 144, -64] (north, south, west, east), and it turns out that [46.423669, -129.9427086] sits near the center, at least horizontally.
If you need the graph separately: https://josephelhaddad.github.io/plotly/b_in_sea3_orthographic
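The centering check can also be done numerically. The sketch below is one way to do it, not necessarily how the plot was built: it assumes the box runs eastward from Guam across the antimeridian to the U.S. Virgin Islands, and `lon_midpoint` is a hypothetical helper written for this example. Under those assumptions, the midpoint longitude comes out near -139.97, so the cluster sits about 10 degrees east of it, while the midpoint latitude is about 28.37, well south of 46.42.

```python
def lon_midpoint(west, east):
    """Midpoint of a longitude interval running eastward from west to east.

    Handles intervals that cross the antimeridian (180 degrees).
    """
    span = (east - west) % 360           # eastward width of the interval
    mid = west + span / 2                # may exceed 180 after the crossing
    return (mid + 180) % 360 - 180       # wrap back into [-180, 180)


# Extreme points listed in the post (degrees).
north, south = 71.290556, -14.546667
west, east = 144.618194, -64.566944      # assumption: box crosses the antimeridian

# The most repeated coordinate in the data set.
point_lat, point_lon = 46.423669, -129.9427086

center_lat = (north + south) / 2
center_lon = lon_midpoint(west, east)
half_span = ((east - west) % 360) / 2
lon_offset = point_lon - center_lon      # positive means east of the midpoint

print(f"box center: {center_lat:.4f}, {center_lon:.4f}")
print(f"longitude offset: {lon_offset:.4f} of a half-span of {half_span:.4f}")
print(f"latitude offset: {point_lat - center_lat:.4f}")
```

A plain average of the two longitudes would land on the wrong side of the globe, which is why the interval width is taken modulo 360 before halving.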
We might be able to draw a few quick conclusions:
- The clusters we see are probably the coordinates the devs fall back to when invalid coordinates are entered.
- The randomly placed business locations, which are often single points, are probably intentional.